Executive Summary

The main purpose of this project was to determine the impact of applying leading-edge technology in education on students’ academic behavior. To achieve this purpose, a public Kaggle dataset was used. The analysis consists of two parts: exploratory analysis and hypothesis testing. In the exploratory part, the presence of missing data, the distributions, and the nature and possible scores of the variables in the dataset were explored using plots and other relevant functions in R. In the second part, four different groups of hypotheses were formulated and tested using the corresponding statistical methods. Associations between students’ responsible parent and their satisfaction, parents’ satisfaction and students’ final grade, and responsible parent and students’ final grade were examined using chi-squared tests. In addition, an independent-samples t-test was employed to examine the difference in hand raising between male and female students. Another point addressed in this part was the correlation between announcement views and participation in discussion. The final part of the analysis dealt with multiple linear regression, which was used to examine the prediction of students’ hand raising from resource visits and announcement views. The exploratory data analysis indicated that there are no missing values in the dataset and that none of the quantitative variables is normally distributed. The hypothesis tests showed statistically significant associations between students’ responsible parent and their satisfaction, parents’ satisfaction and students’ final grade, and responsible parent and students’ final grade. The t-test indicated a statistically significant difference in hand-raising behavior between female and male students [t(478) = 3.32, p < .001].
The correlation test indicated a statistically significant correlation between participation in discussion and announcement views [τ = 0.285, p < .0001]. Moreover, a significant regression equation was found (F(2, 477) = 306.12, p < .0001) for predicting students’ hand raising from resource visits and announcement views. Based on these results it was concluded that applying leading-edge technology in education can make a positive contribution to improving students’ academic behavior.

About the Dataset

The dataset (“Students’ Academic Performance Dataset”) used in this project was obtained from Kaggle (https://www.kaggle.com/aljarah/xAPI-Edu-Data/data). It is provided in CSV format (xAPI-Edu-Data.csv) together with its codebook. It consists of 480 participants and 16 variables. As mentioned in the introduction of the codebook, the variables in the dataset are classified into three major categories: (1) demographic features such as gender and nationality; (2) academic background features such as educational stage, grade level, and section; (3) behavioral features such as raising a hand in class, opening resources, parents answering a survey, and parents’ school satisfaction. The dataset has no missing values from the very beginning. Generally, the dataset consists of educational information collected from a learning management system (LMS) designed to facilitate learning through applying leading-edge technology in education. The system provides users with synchronous access to educational resources from any device with an Internet connection. The data were collected using a learner activity tracker tool, a component of the training and learning architecture that makes it possible to monitor students’ learning progress and actions such as reading an article or watching a training video. More details about the dataset, including its codebook, are available at https://www.kaggle.com/aljarah/xAPI-Edu-Data

Load the required packages

library(tidyverse)
library(broom)
library(psych)
library(stargazer)
library(car)
library(ggfortify)

Import the dataset

given_dataset<- read_csv(file="/Users/biruk/Dropbox/Final Project in R/xAPI-Edu-Data.csv")

Exploratory Analysis

Study the variables in the dataset

Explore the nature of the variables in the dataset using the summary function

summary(given_dataset)
##     gender          NationalITy        PlaceofBirth      
##  Length:480         Length:480         Length:480        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##    StageID            GradeID           SectionID        
##  Length:480         Length:480         Length:480        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##     Topic             Semester           Relation          raisedhands    
##  Length:480         Length:480         Length:480         Min.   :  0.00  
##  Class :character   Class :character   Class :character   1st Qu.: 15.75  
##  Mode  :character   Mode  :character   Mode  :character   Median : 50.00  
##                                                           Mean   : 46.77  
##                                                           3rd Qu.: 75.00  
##                                                           Max.   :100.00  
##  VisITedResources AnnouncementsView   Discussion    ParentAnsweringSurvey
##  Min.   : 0.0     Min.   : 0.00     Min.   : 1.00   Length:480           
##  1st Qu.:20.0     1st Qu.:14.00     1st Qu.:20.00   Class :character     
##  Median :65.0     Median :33.00     Median :39.00   Mode  :character     
##  Mean   :54.8     Mean   :37.92     Mean   :43.28                        
##  3rd Qu.:84.0     3rd Qu.:58.00     3rd Qu.:70.00                        
##  Max.   :99.0     Max.   :98.00     Max.   :99.00                        
##  ParentschoolSatisfaction StudentAbsenceDays    Class          
##  Length:480               Length:480         Length:480        
##  Class :character         Class :character   Class :character  
##  Mode  :character         Mode  :character   Mode  :character  
##                                                                
##                                                                
## 

Study missing values in the dataset

sapply(given_dataset,function(x) sum(is.na(x)))
##                   gender              NationalITy             PlaceofBirth 
##                        0                        0                        0 
##                  StageID                  GradeID                SectionID 
##                        0                        0                        0 
##                    Topic                 Semester                 Relation 
##                        0                        0                        0 
##              raisedhands         VisITedResources        AnnouncementsView 
##                        0                        0                        0 
##               Discussion    ParentAnsweringSurvey ParentschoolSatisfaction 
##                        0                        0                        0 
##       StudentAbsenceDays                    Class 
##                        0                        0

No variable in the dataset has missing values

Exclude variables not to be used in this project

performance_data<-given_dataset %>% 
 select(-PlaceofBirth, -StageID, -GradeID, -SectionID, -Semester, -ParentAnsweringSurvey, -StudentAbsenceDays)

Explore the possible scores for non-continuous variables

performance_data %>% distinct(gender)
## # A tibble: 2 x 1
##   gender
##    <chr>
## 1      M
## 2      F
performance_data %>% distinct(Topic)
## # A tibble: 12 x 1
##        Topic
##        <chr>
##  1        IT
##  2      Math
##  3    Arabic
##  4   Science
##  5   English
##  6     Quran
##  7   Spanish
##  8    French
##  9   History
## 10   Biology
## 11 Chemistry
## 12   Geology
performance_data %>% distinct(Relation)
## # A tibble: 2 x 1
##   Relation
##      <chr>
## 1   Father
## 2      Mum
performance_data %>% distinct(ParentschoolSatisfaction)
## # A tibble: 2 x 1
##   ParentschoolSatisfaction
##                      <chr>
## 1                     Good
## 2                      Bad
performance_data %>% distinct(Class)
## # A tibble: 3 x 1
##   Class
##   <chr>
## 1     M
## 2     L
## 3     H
performance_data %>% distinct(NationalITy)
## # A tibble: 14 x 1
##    NationalITy
##          <chr>
##  1          KW
##  2     lebanon
##  3       Egypt
##  4 SaudiArabia
##  5         USA
##  6      Jordan
##  7    venzuela
##  8        Iran
##  9       Tunis
## 10     Morocco
## 11       Syria
## 12   Palestine
## 13        Iraq
## 14       Lybia

Declare the qualitative variables as factors so that R treats them correctly

subject <-factor(performance_data$Topic, levels = c("English", "Spanish", "French", "Arabic", "IT", "Math", "Chemistry", "Biology", "Science", "History", "Quran", "Geology"))
resp_parent <- factor(performance_data$Relation, levels = c("Mum", "Father"))
Parent_Sats <- ordered(performance_data$ParentschoolSatisfaction, levels = c("Bad", "Good"))
final_grade <- ordered(performance_data$Class, levels = c("L", "M", "H"))
sex <- factor(performance_data$gender, levels = c("F", "M"))
citizenship <- factor(performance_data$NationalITy,levels = c("KW", "lebanon", "Egypt", "SaudiArabia", "USA", "Jordan", "venzuela", "Iran", "Tunis", "Morocco", "Syria", "Palestine", "Iraq", "Lybia"))

Transform the dataset so that string variables are converted to numeric form

stud_performance_data<-performance_data %>% 
  mutate (subject = as.integer(factor(Topic, levels = c("English", "Spanish", "French", "Arabic", "IT", "Math", "Chemistry", "Biology", "Science", "History", "Quran", "Geology"))), 
          resp_parent = as.integer(factor(Relation, levels = c("Mum", "Father"))),
          Parent_Sats = as.integer(factor(ParentschoolSatisfaction, levels = c("Bad", "Good"))), 
          final_grade = as.integer(factor(Class, levels = c("L", "M", "H"))),
          citizenship = as.integer(factor(NationalITy, levels = c("KW", "lebanon", "Egypt", "SaudiArabia", "USA", "Jordan", "venzuela", "Iran", "Tunis", "Morocco", "Syria", "Palestine", "Iraq", "Lybia"))),
          sex = as.integer(factor(gender, levels = c("F", "M")))) %>% 
           as_tibble()

Study outliers

Explore outliers of the continuous variables using boxplots and the boxplot.stats function

boxplot(stud_performance_data$Discussion, stud_performance_data$raisedhands, stud_performance_data$VisITedResources, stud_performance_data$AnnouncementsView, horizontal=TRUE)

As can be clearly seen from the boxplots above, there are no data points below the lower limit or above the upper limit in any of the four box plots, so none of the four variables contains outliers.

Besides the boxplots, boxplot statistics can also be used to check for outliers. For instance, the boxplot.stats function confirms that there are no outliers in “Discussion”

boxplot.stats(stud_performance_data$Discussion)
## $stats
## [1]  1 20 39 70 99
## 
## $n
## [1] 480
## 
## $conf
## [1] 35.39416 42.60584
## 
## $out
## integer(0)

Check for normality of quantitative variables in the dataset

plot(density(stud_performance_data$raisedhands))

plot(density(stud_performance_data$Discussion))

hist(stud_performance_data$VisITedResources)

hist(stud_performance_data$AnnouncementsView)

As can be seen from the plots above, the distributions of the variables “raisedhands”, “Discussion”, “VisITedResources”, and “AnnouncementsView” are not normal.
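The visual impression can also be checked with a formal test. As a minimal sketch (an addition, not part of the original analysis), Shapiro-Wilk tests could be run on each variable; a p-value below .05 would reject normality:

```r
# Sketch: formal normality checks for the four quantitative variables.
# The Shapiro-Wilk null hypothesis is that the data are normal, so a
# small p-value (< .05) supports the visual conclusion of non-normality.
shapiro.test(stud_performance_data$raisedhands)
shapiro.test(stud_performance_data$Discussion)
shapiro.test(stud_performance_data$VisITedResources)
shapiro.test(stud_performance_data$AnnouncementsView)
```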

Visualization of the variables in the dataset

The composition of student participants in terms of subjects/courses can be shown using a pie chart.

pie(table(stud_performance_data$Topic))

The composition of student participants in terms of their citizenship can be shown using a bar plot

barplot(table(stud_performance_data$NationalITy), xlab = 'participants citizenship', ylab = 'Number Of Participants')

Hypothesis Testing

Test for association

Is there an association between students’ responsible parent and their satisfaction?

table(resp_parent, Parent_Sats)
##            Parent_Sats
## resp_parent Bad Good
##      Mum     44  153
##      Father 144  139
chisq.test(table(resp_parent, Parent_Sats))
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table(resp_parent, Parent_Sats)
## X-squared = 38.541, df = 1, p-value = 5.363e-10

The chi-squared test result indicated that there is a statistically significant association between the parent responsible for the student’s education and their satisfaction (χ2(1, N = 480) = 38.54, p < .0001).
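To gauge the strength of this association, an effect size can be added. The sketch below (an addition, not part of the original analysis) computes Cramér’s V from the chi-squared statistic:

```r
# Sketch: Cramér's V for the responsible-parent vs. satisfaction table.
# V = sqrt(X2 / (N * min(rows - 1, cols - 1))); for a 2x2 table the
# term min(rows - 1, cols - 1) is 1.
tab <- table(resp_parent, Parent_Sats)
chi <- chisq.test(tab)
v <- sqrt(unname(chi$statistic) / sum(tab))
v  # with X2 = 38.54 and N = 480 this is about 0.28
```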

Plot the result

plot(resp_parent, Parent_Sats, ylab = "Parents' satisfaction", xlab = "Responsible parent for students education")

The plot communicates that a larger proportion of participants in the mothers group reported good satisfaction than in the fathers group

Is there an association between parents’ satisfaction and their children’s final grade score?

table(Parent_Sats, final_grade)
##            final_grade
## Parent_Sats   L   M   H
##        Bad   84  80  24
##        Good  43 131 118
chisq.test(table(Parent_Sats, final_grade))
## 
##  Pearson's Chi-squared test
## 
## data:  table(Parent_Sats, final_grade)
## X-squared = 68.47, df = 2, p-value = 1.355e-15

The chi-squared test showed a statistically significant association between parents’ satisfaction and their children’s final grade score (χ2(2, N = 480) = 68.47, p < .0001).

Now show the result in a plot

plot(Parent_Sats, final_grade, ylab = "Student's final grade score", xlab = "Parents satisfaction")

The plot showed that a larger proportion of students whose parents reported good satisfaction scored a better grade than students whose parents reported bad satisfaction.

Test for group difference

Is there a difference in students’ hand raising as a function of their sex?

Examine the assumptions for a parametric test to decide which statistical test to use. Independence of observations holds automatically, as the two groups (male and female) are completely independent.

Draw a box plot to examine the equivalence of variance between the two groups

boxplot(stud_performance_data$raisedhands~sex)

The boxplot indicates that the variances of the two groups are almost the same. This can also be verified using Levene’s test

leveneTest(stud_performance_data$raisedhands~sex) 
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   1  0.9706  0.325
##       478

Levene’s test confirmed that the difference in variance between the two groups is not significant.

Now conduct t-test to verify/refute the hypothesis

t.test(stud_performance_data$raisedhands~stud_performance_data$sex, mu=0, alt="two.sided", conf=0.95, var.eq=T, paired=F)
## 
##  Two Sample t-test
## 
## data:  stud_performance_data$raisedhands by stud_performance_data$sex
## t = 3.3165, df = 478, p-value = 0.0009809
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   3.904502 15.257278
## sample estimates:
## mean in group 1 mean in group 2 
##        52.86286        43.28197

To include SD statistics in the report, calculate them

sd(stud_performance_data$raisedhands[sex=="F"])
## [1] 30.21805
sd(stud_performance_data$raisedhands[sex=="M"])
## [1] 30.60217

So, an independent-samples t-test was conducted to compare students’ hand raising as a function of their sex. The result indicated a significant difference in hand raising between male (M = 43.28, SD = 30.60) and female (M = 52.86, SD = 30.22) students [t(478) = 3.32, p < .001], with female students raising their hands more often.
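The size of this difference can be summarized with Cohen’s d. The sketch below (an addition, not part of the original analysis) uses the pooled standard deviation; with the reported means and SDs it works out to roughly 0.31, a small-to-medium effect:

```r
# Sketch: Cohen's d for the sex difference in hand raising,
# using the pooled standard deviation of the two groups.
x_f <- stud_performance_data$raisedhands[sex == "F"]
x_m <- stud_performance_data$raisedhands[sex == "M"]
sd_pooled <- sqrt(((length(x_f) - 1) * var(x_f) +
                   (length(x_m) - 1) * var(x_m)) /
                  (length(x_f) + length(x_m) - 2))
(mean(x_f) - mean(x_m)) / sd_pooled  # roughly 0.31 given the reported stats
```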

Correlation

Is there a statistically significant relationship between announcement views and participation in discussion?

In the exploratory analysis part it was confirmed that the distributions of both announcement views and discussion were not normal.

So, check whether a log transform of the scores helps with normality

log_Discussion<-log(stud_performance_data$Discussion)
qqnorm(log_Discussion, col='blue')
qqline(log_Discussion, col ="red")

log_AnnouncementsView<-log(stud_performance_data$AnnouncementsView + 1) 
qqnorm(log_AnnouncementsView, col='blue')
qqline(log_AnnouncementsView, col ="red")

Still, the transformation did not make either variable’s scores normal

Plot the data to examine whether the relationship between the two variables is monotonic, as required for Kendall’s tau.

plot(stud_performance_data$Discussion, stud_performance_data$AnnouncementsView, main="scatterplot", las=1)

The plot shows that the assumption of a monotonic relationship is fulfilled

Now conduct correlation test

cor.test(stud_performance_data$Discussion, stud_performance_data$AnnouncementsView, method="kendall")
## 
##  Kendall's rank correlation tau
## 
## data:  stud_performance_data$Discussion and stud_performance_data$AnnouncementsView
## z = 9.1773, p-value < 2.2e-16
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##       tau 
## 0.2851579

Plot the result, coloring points by gender just to make the plot more informative

stud_performance_data %>% 
  ggplot() +
  aes(x = AnnouncementsView, y = Discussion) +
  geom_point(aes(color = gender), size = 3) +
  geom_smooth(method = "lm")

Kendall’s tau was computed to assess the relationship between announcement views and students’ participation in discussion. The result showed a significant correlation between the two variables [τ = 0.285, z = 9.18, p < .0001].

Regression Analysis

  • Do resource visiting and announcement viewing significantly predict students’ hand raising?

The relationships among the variables used in the hypothesis were studied using a scatterplot matrix (used to check the linearity of the relationship between the IVs and the DV)

plot(stud_performance_data[5:7], pch=16, col="blue", main="Matrix Scatterplot of raisedhands, VisITedResources, AnnouncementsView")

As can be seen from the scatterplot matrix, the assumption of a linear relationship between the IVs and the DV is fulfilled.

The next step is to check for multicollinearity of the independent variables. It is best to start by looking at the correlation matrix among the variables

library(corrplot)
check_cor = cor(stud_performance_data[5:7])
corrplot(check_cor, method = "number")

As can be seen from the correlation matrix, the correlation between the two independent variables is 0.59, which is below the cutoff (.8)

Fit a Linear model and continue testing its assumptions

lm1<-lm(raisedhands~VisITedResources + AnnouncementsView, data=stud_performance_data)
summary(lm1)
## 
## Call:
## lm(formula = raisedhands ~ VisITedResources + AnnouncementsView, 
##     data = stud_performance_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.254 -11.179   0.393  12.918  62.537 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        6.63719    1.87488   3.540 0.000439 ***
## VisITedResources   0.44433    0.03506  12.673  < 2e-16 ***
## AnnouncementsView  0.41641    0.04358   9.554  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.41 on 477 degrees of freedom
## Multiple R-squared:  0.5621, Adjusted R-squared:  0.5602 
## F-statistic: 306.1 on 2 and 477 DF,  p-value: < 2.2e-16

Now assess multicollinearity using the variance inflation factor (VIF)

car::vif(lm1)
##  VisITedResources AnnouncementsView 
##          1.546624          1.546624

The values look fine, as they are not very large. However, since the average VIF is larger than 1, multicollinearity may be biasing the model slightly.

mean(car::vif(lm1))
## [1] 1.546624

So, look at the tolerance.

1/car::vif(lm1)
##  VisITedResources AnnouncementsView 
##         0.6465697         0.6465697

The tolerance values (0.65) look good, as they are well above the commonly used cutoff of 0.2.

Now, assess the independence of residuals

car::dwt(lm1)
##  lag Autocorrelation D-W Statistic p-value
##    1       0.2779508      1.443303       0
##  Alternative hypothesis: rho != 0

The Durbin-Watson test indicates some autocorrelation in the residuals, so the residuals may not be independent.

To check for heteroscedasticity, inspect the residual diagnostic plots.

autoplot(lm1, which = 1:6, label.size = 3)

From the plot it is possible to observe that the residuals are randomly distributed around zero. Besides, the Q-Q plot suggests that the residuals follow a normal distribution. So, the residuals in this model pass the normality check.

The scale-location plot bends a bit away from the ideal horizontal line, but it still approximately indicates that the residuals have uniform variance across the range.

Examine the outliers to see whether there is a concrete reason to eliminate them. None was found.

stud_performance_data %>% 
  slice(c(96, 178, 187, 345, 382))
## # A tibble: 5 x 16
##   gender NationalITy  Topic Relation raisedhands VisITedResources
##    <chr>       <chr>  <chr>    <chr>       <int>            <int>
## 1      F          KW     IT   Father         100               80
## 2      F         USA French      Mum          15               52
## 3      M          KW Arabic      Mum          85               15
## 4      F      Jordan French      Mum          14               97
## 5      F      Jordan Arabic   Father          10               12
## # ... with 10 more variables: AnnouncementsView <int>, Discussion <int>,
## #   ParentschoolSatisfaction <chr>, Class <chr>, subject <int>,
## #   resp_parent <int>, Parent_Sats <int>, final_grade <int>,
## #   citizenship <int>, sex <int>

Fit the second model which consists of both predictors and their interaction together.

lm2<-lm(raisedhands~VisITedResources*AnnouncementsView, data=stud_performance_data)
summary(lm2)
## 
## Call:
## lm(formula = raisedhands ~ VisITedResources * AnnouncementsView, 
##     data = stud_performance_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.386 -11.488  -0.135  12.794  62.720 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)                        7.741111   2.690976   2.877  0.00420
## VisITedResources                   0.423887   0.050066   8.467 3.16e-16
## AnnouncementsView                  0.359233   0.109016   3.295  0.00106
## VisITedResources:AnnouncementsView 0.000840   0.001468   0.572  0.56741
##                                       
## (Intercept)                        ** 
## VisITedResources                   ***
## AnnouncementsView                  ** 
## VisITedResources:AnnouncementsView    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.43 on 476 degrees of freedom
## Multiple R-squared:  0.5624, Adjusted R-squared:  0.5596 
## F-statistic: 203.9 on 3 and 476 DF,  p-value: < 2.2e-16

The p value indicates that the interaction between the two independent variables does not contribute significantly to the model. Compared with model 1 above, the F-statistic dropped from 306.1 to 203.9, while no substantial change was seen in the residual standard error or the adjusted R-squared.

The VIF of the model indicates that the interaction term has a score of 15.01, which is larger than 10, so multicollinearity is an issue here.

car::vif(lm2)
##                   VisITedResources                  AnnouncementsView 
##                           3.149303                           9.662806 
## VisITedResources:AnnouncementsView 
##                          15.007122

In addition, the average VIF (9.27) confirms that multicollinearity is biasing the model.

mean(car::vif(lm2))
## [1] 9.273077

The tolerance values also indicate that, except for visiting resources, the other two terms (announcement views and the interaction) fall below acceptable levels.

1/car::vif(lm2)
##                   VisITedResources                  AnnouncementsView 
##                         0.31753057                         0.10348960 
## VisITedResources:AnnouncementsView 
##                         0.06663503

Measure the independence of residuals

car::dwt(lm2)
##  lag Autocorrelation D-W Statistic p-value
##    1       0.2785941      1.441987       0
##  Alternative hypothesis: rho != 0

It seems that this model also has significant autocorrelation in the residuals, so the residuals are not independent. To check for heteroscedasticity, inspect the residual diagnostic plots.

autoplot(lm2, which = 1:6, label.size = 3)

Fit the third model, which includes only the interaction between the two predictors, and continue testing the linear regression assumptions

lm3<-lm(raisedhands~VisITedResources:AnnouncementsView, data=stud_performance_data)
summary(lm3)
## 
## Call:
## lm(formula = raisedhands ~ VisITedResources:AnnouncementsView, 
##     data = stud_performance_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -61.828 -16.113  -3.039  13.605  74.693 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)                        2.390e+01  1.453e+00   16.45   <2e-16
## VisITedResources:AnnouncementsView 8.798e-03  4.059e-04   21.67   <2e-16
##                                       
## (Intercept)                        ***
## VisITedResources:AnnouncementsView ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.88 on 478 degrees of freedom
## Multiple R-squared:  0.4956, Adjusted R-squared:  0.4946 
## F-statistic: 469.7 on 1 and 478 DF,  p-value: < 2.2e-16

In this model the p value indicates that the interaction of the independent variables contributes significantly to the model.

Assess the independence of residuals in the model.

car::dwt(lm3)
##  lag Autocorrelation D-W Statistic p-value
##    1       0.2484915       1.50235       0
##  Alternative hypothesis: rho != 0

The Durbin-Watson test again indicates autocorrelation in the residuals, so the residuals may not be independent.

To check for heteroscedasticity, inspect the residual diagnostic plots of the model.

autoplot(lm3, which = 1:6, label.size = 3)

Compared to the first model, the adjusted R-squared value decreased while the residual standard error stayed almost the same. On the other hand, the F-statistic improved from 306.1 to 469.7 with almost the same degrees of freedom.

Now, let’s compute the confidence intervals for the parameters.

confint(lm1, level = 0.95)
##                       2.5 %     97.5 %
## (Intercept)       2.9531474 10.3212263
## VisITedResources  0.3754325  0.5132187
## AnnouncementsView 0.3307687  0.5020486
confint(lm2, level = 0.95)
##                                           2.5 %       97.5 %
## (Intercept)                         2.453449404 13.028772138
## VisITedResources                    0.325508381  0.522265018
## AnnouncementsView                   0.145020815  0.573445233
## VisITedResources:AnnouncementsView -0.002044292  0.003724304
confint(lm3, level = 0.95)
##                                           2.5 %       97.5 %
## (Intercept)                        21.044274119 26.754756871
## VisITedResources:AnnouncementsView  0.008000253  0.009595484

Model selection
Compare the models using broom::glance()

glance(lm1)
##   r.squared adj.r.squared    sigma statistic      p.value df    logLik
## 1 0.5620764     0.5602403 20.41105  306.1156 2.975109e-86  3 -2127.303
##        AIC    BIC deviance df.residual
## 1 4262.605 4279.3 198723.5         477
glance(lm2)
##   r.squared adj.r.squared    sigma statistic      p.value df    logLik
## 1 0.5623775     0.5596194 20.42546  203.8985 4.998754e-85  4 -2127.137
##        AIC      BIC deviance df.residual
## 1 4264.275 4285.144 198586.8         476
glance(lm3)
##   r.squared adj.r.squared    sigma statistic      p.value df    logLik
## 1 0.4956469     0.4945917 21.88159  469.7486 4.646176e-73  2 -2161.198
##        AIC      BIC deviance df.residual
## 1 4328.396 4340.918 228868.2         478

Based on the principle of choosing the model with the smallest AIC and BIC for the same data, the first model (lm1) is selected.

To confirm the model selection, the anova function can be used

anova(lm1, lm2)
## Analysis of Variance Table
## 
## Model 1: raisedhands ~ VisITedResources + AnnouncementsView
## Model 2: raisedhands ~ VisITedResources * AnnouncementsView
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    477 198723                           
## 2    476 198587  1    136.63 0.3275 0.5674
anova(lm1, lm3)
## Analysis of Variance Table
## 
## Model 1: raisedhands ~ VisITedResources + AnnouncementsView
## Model 2: raisedhands ~ VisITedResources:AnnouncementsView
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1    477 198723                                  
## 2    478 228868 -1    -30145 72.357 2.334e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The anova test for model comparison showed no significant difference between the first and second models (F = 0.33, p = 0.567).

The F value indicated a significant difference between the first and third models in favor of the first. Therefore, the anova analysis also confirmed that the first model is the winner.

Normal distribution of residuals in the winning model can be confirmed again as follows

stud_performance_data %>% 
  augment(lm(raisedhands~VisITedResources + AnnouncementsView, data = .), .) %>% 
  ggplot() +
  aes(.resid) +
  geom_histogram(bins = 10)

The residuals of the selected model are approximately normally distributed!

This can also be confirmed using a statistical test

stud_performance_data %>% 
  augment(lm(raisedhands ~ VisITedResources + AnnouncementsView, data = .), .) %>% 
  pull(.resid) %>% 
  shapiro.test(.)
## 
##  Shapiro-Wilk normality test
## 
## data:  .
## W = 0.99443, p-value = 0.07819

The Shapiro-Wilk test also indicates that the residuals do not deviate significantly from normality (p = 0.078). Therefore, the model is valid.

Explore the joint effect of visiting resources and announcement viewing on hand raising using the coplot function.

coplot(raisedhands~VisITedResources|AnnouncementsView, panel = panel.smooth, stud_performance_data)

Load the plotly package and plot the result of the model in 3D.

library(plotly)
plot_ly(stud_performance_data, y= ~stud_performance_data$raisedhands, x= ~stud_performance_data$AnnouncementsView, z= ~stud_performance_data$VisITedResources)
## No trace type specified:
##   Based on info supplied, a 'scatter3d' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#scatter3d
## No scatter3d mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode

Now, report the results of the multiple regression analysis in table form.

library(stargazer) 
library(knitr)
  stargazer(lm1,
            lm2,
            lm3,
            title = "Multiple regression analysis result",
            dep.var.labels = "Hand Raising",
            align = TRUE,
            ci = TRUE,
            df = TRUE,
            digits = 2,
            type = "html")
Multiple regression analysis result
Dependent variable:
Hand Raising
(1) (2) (3)
VisITedResources 0.44*** 0.42***
(0.38, 0.51) (0.33, 0.52)
AnnouncementsView 0.42*** 0.36***
(0.33, 0.50) (0.15, 0.57)
VisITedResources:AnnouncementsView 0.001 0.01***
(-0.002, 0.004) (0.01, 0.01)
Constant 6.64*** 7.74*** 23.90***
(2.96, 10.31) (2.47, 13.02) (21.05, 26.75)
Observations 480 480 480
R2 0.56 0.56 0.50
Adjusted R2 0.56 0.56 0.49
Residual Std. Error 20.41 (df = 477) 20.43 (df = 476) 21.88 (df = 478)
F Statistic 306.12*** (df = 2; 477) 203.90*** (df = 3; 476) 469.75*** (df = 1; 478)
Note: *p<0.1; **p<0.05; ***p<0.01

To report the results on a standardized scale, transform the unstandardized coefficients using the lm.beta package

library(lm.beta)

Create standardized versions of all model objects

lm1_std <- lm.beta(lm1)
lm2_std <- lm.beta(lm2)
lm3_std <- lm.beta(lm3)

Explicitly tell stargazer which coefficients we want to see

  stargazer(lm1_std,
            lm2_std,
            lm3_std,
            coef = list(lm1_std$standardized.coefficients,
                        lm2_std$standardized.coefficients,
                        lm3_std$standardized.coefficients),
                        
            title = "Result of multiple regression analysis(standerdized)",
            dep.var.labels = "Raising Hand",
            align = TRUE,
            ci = TRUE,
            df = TRUE,
            digits = 2,
            type = "html")
Result of multiple regression analysis (standardized)
Dependent variable:
Raising Hand
(1) (2) (3)
VisITedResources 0.48*** 0.46***
(0.41, 0.55) (0.36, 0.55)
AnnouncementsView 0.36*** 0.31***
(0.27, 0.45) (0.10, 0.52)
VisITedResources:AnnouncementsView 0.07*** 0.70***
(0.06, 0.07) (0.70, 0.70)
Constant 0.00 0.00 0.00
(-3.67, 3.67) (-5.27, 5.27) (-2.85, 2.85)
Observations 480 480 480
R2 0.56 0.56 0.50
Adjusted R2 0.56 0.56 0.49
Residual Std. Error 20.41 (df = 477) 20.43 (df = 476) 21.88 (df = 478)
F Statistic 306.12*** (df = 2; 477) 203.90*** (df = 3; 476) 469.75*** (df = 1; 478)
Note: *p<0.1; **p<0.05; ***p<0.01

A multiple linear regression was calculated to predict hand raising from resource visits and announcement views. A significant regression equation was found (F(2, 477) = 306.12, p < .0001), with R2 = 0.56. Participants’ predicted hand raising is equal to 6.64 + 0.44 (resource visits) + 0.42 (announcement views). Both resource visits and announcement views were significant predictors of hand-raising frequency.
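As a usage sketch (an addition, with values chosen purely for illustration), the selected model can generate predictions for a hypothetical student with predict():

```r
# Sketch: predicted hand raising for a hypothetical student with
# 50 resource visits and 30 announcement views (illustrative values).
new_student <- data.frame(VisITedResources = 50, AnnouncementsView = 30)
predict(lm1, newdata = new_student, interval = "prediction")
# point estimate: 6.64 + 0.44*50 + 0.42*30, i.e. about 41 hand raises
```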

Conclusion

The main goal of this project was to examine the effect of applying leading-edge technology in education on students’ academic behavior. To achieve this goal, different research questions were formulated and relevant statistical tests were employed. Based on the results it was concluded that:

1. Students’ academic performance is better when the mother is the parent responsible for the student’s education, and when parents are satisfied with their children’s education.

2. Students’ hand-raising behavior is associated with their sex: female students raise their hands more often than their male counterparts.

3. There is an association between announcement views and participation in discussion.

4. Resource visits and announcement views are significant predictors of hand raising.